Method for obtaining a unified information graph from multiple information resources

ABSTRACT

A method for dynamically obtaining a unified classification information graph which provides a navigation system for a user to access sought information. The method includes providing multiple information resources that include a hierarchy of categories, each of which is associated with a category. Leaf categories in the hierarchy are connected to information pages. The method further provides generating a unified classification information graph by using the hierarchy of categories and the categories of the multiple information resources. The unified classification graph includes a hierarchy of unified categories. Leaf unified categories in the hierarchy are connected to information pages, whereby information pages accessible through the hierarchy of the multiple information resources are also accessible through the hierarchy of the unified classification information graph.

FIELD OF THE INVENTION

[0001] This invention relates to data retrieval and learning systems.

BACKGROUND OF THE INVENTION

[0002] In the context of distributed information systems (e.g. the Internet), there is a need to provide end users with a centralized access and search service to information residing in multiple heterogeneous on-line catalogs. These on-line catalogs should be viewed by the users as if they were using the very same access method, information classification and nomenclature. This concept is called “information integration” and is the subject of several research and development efforts. Among them are:

[0003] Stanford University Knowledge Systems Laboratory (KSL) Ontology Server Projects.

[0004] Microelectronics and Computer Technology Corporation (MCC): InfoSleuth Project (MCC, Austin, Tex.).

[0005] The main problems associated with information integration include dealing with the different conceptualization systems and selecting resources.

[0006] Dealing with different conceptualization systems includes providing access to relevant information that is accessible through different classification methods and described using non-identical nomenclatures. This requires bridging the gap between the different conceptualization systems: the one used by the user to describe his query and those used by each of the different information resources. These conceptualization differences range from different classification methods to different nomenclature. For example, consider a user searching for “RS232 Cable for Printer” which is listed in one on-line catalog under the name “RS232 cable” in the sub-section called “Accessories” in the super-section called “Printers” and in another on-line catalog under the name “Printer cable” in the section “Hardware accessories.” This is a very tough task, since it involves the formalization of “knowledge.”

[0007] Dealing with resource selection includes deciding which one of the available information resources is relevant for a specific information request. For example, there is no point in accessing resources providing information about restaurants when the user is looking for an automobile. In the domain level, this is an easy task. However, in larger arrays of information resources from similar domains, the problem becomes harder.

[0008] The research projects listed above deal with different aspects of these problems and make different assumptions about the environment. However, prior to the present invention, there have been no general-purpose information integration systems. There are two main reasons for this:

[0009] 1. There are no automatic mechanisms to “connect” to new information resources.

[0010] Current solutions to the task of connecting to information resources are based on the assumption that “someone” (either the information requester or the information provider) provides an information source “wrapper” that enables “smooth” integration to the data.

[0011] 2. There was no way to automatically create a large-scale conceptualization system. A current solution to the problem of creating a common unified conceptualization system is a manual solution provided by the Knowledge System Laboratory (KSL) at Stanford University. The KSL staff has developed a set of tools and services to support the process of manually building and achieving consensus on a common shared conceptualization system (termed “Ontology”).

[0012] It is only natural, then, that the lack of a real world conceptualization system adversely affects both the quality of the information being retrieved (recall and precision) and the quality of the user-computer interaction. That is, real world information integration requires the automatic acquisition of a conceptual knowledge base, i.e., a conceptualization system.

[0013] In recent years, the task of automatic knowledge acquisition was usually approached by corpus-based NLP. Free text documents were used as a source for learning different relations between words, e.g., by contextual similarity.

SUMMARY OF THE INVENTION

[0014] The emergence of a global standard computer network, and more specifically, the Internet, has led to the proliferation of classified on-line catalogs. This enables use of information navigation systems. One of the innovations of the present invention is the usage of the knowledge embedded in these very navigation systems as a new source for the knowledge acquisition task in order to generate a so-called unified classification information graph. Information navigation systems, by their nature, imply hierarchy relations between categories, hence they provide more precise category-relations information than free text does. The categories and the hierarchy relations between categories are utilized in the process of generating the unified classification information graph.

[0015] The present invention offers a solution to overcome the difficulties in the usage of multiple resources so as to generate the desired unified classification information graph. For example, the same piece of information may be expressed in different word order or levels of abstraction.

[0016] Since on-line catalogs are by nature subject to frequent (and occasionally also major) changes, e.g., new products/categories are added and/or others are deleted, it is important to assure that all or at least most of the modifications that occur in the on-line catalogs will be reflected in the resulting unified classification information graph. Accordingly, one of the important advantages of the system is the dynamic nature thereof, i.e., the ability to dynamically scan the multiple information resources and update, whenever required, the resulting unified information graph.

[0017] Thus the invention fulfills a long felt need by providing a system and method for obtaining and integrating multiple classification information resources using a single unified access interface.

[0018] One aspect of the invention provides for a method for dynamically obtaining a unified classification information graph that provides a navigation system for a user to access sought information. The method includes providing multiple information resources that include a respective hierarchy of categories each of which is associated with a category; leaf categories in said hierarchy being connected to information pages. The method also includes generating a unified classification information graph utilizing at least the hierarchy of categories and the categories of said multiple information resources; said unified classification graph includes a hierarchy of unified categories; leaf unified categories in said hierarchy being connected to information pages. Information pages accessible through the hierarchy of said multiple information resources are also accessible through the hierarchy of said unified classification information graph.

[0019] In one embodiment, the providing multiple information resources includes providing at least some of the multiple information resources that are located in sites of the Internet.

[0020] In another embodiment, the providing multiple information resources includes providing at least some of the multiple information resources that are located in databases.

[0021] In still another embodiment, the providing multiple information resources includes providing at least some of the multiple information resources that are located in an on-line catalog.

[0022] Still further, there is provided the step of associating categories in the hierarchy of categories in the multiple information resources with hyperlinks.

[0023] Yet still further, there is provided the step of associating categories in the hierarchy of categories in the multiple information resources with menus.

[0024] In one embodiment, the generating of a unified classification information graph includes:

[0025] initializing so as to generate a respective “link graph” that corresponds to each information resource. The link graph includes link graph categories.

[0026] normalizing the link graph categories so as to generate a classification graph that includes classification graph categories.

[0027] unifying the classification graph so as to generate the unified classification information graph.

[0028] In this embodiment there is further provided the step of providing URL pointers of the on-line catalog for generating the link graph.

[0029] Another aspect of the invention provides for a machine having a memory that contains data representing a unified classification information graph generated by the above method.

[0030] Still further, there is provided memory for storing data accessible by an application program, which program is accessed by a user through a user interface for the user to access sought information. The application program is executed on a data processing system. The data includes a data structure stored in the memory, the data structure including a unified classification information graph generated from multiple information resources. The unified classification information graph includes a hierarchy of unified categories; leaf unified categories in said hierarchy being connected to information pages. Information pages that are accessible through the multiple information resources are also accessible through the hierarchy of the unified classification information graph.

[0031] The invention further provides for a system for dynamically obtaining a unified classification information graph that provides a navigation system for a user to access sought information. The system includes an input device receiving multiple information resources that include a respective hierarchy of categories each of which is associated with a category. Leaf categories in the hierarchy are connected to information pages. The system also includes a generator, generating a unified classification information graph utilizing at least the hierarchy of categories and the categories of said multiple information resources. The unified classification information graph includes a hierarchy of unified categories. Leaf unified categories in the hierarchy are connected to information pages. Information pages accessible through the hierarchy of the multiple information resources are also accessible through the hierarchy of said unified classification information graph.

[0032] Another aspect of the invention provides for use with a unified classification information graph generated by the above method, a method for retrieving information of interest. The method includes providing a user query, and identifying unified categories in the unified classification information graph which substantially match said query. According to the latter embodiment there is further provided the step of identifying the at least one information page in the unified classification information graph that is connected to the unified categories.

[0033] Preferably, any information page that is connected to a leaf unified category in the unified classification information graph contains information that can be described by the unified category information of said unified leaf category. Unified category information stands for the unified category of the leaf category and the unified categories of all its ancestors in the hierarchy.

[0034] Still further, preferably, all the information pages in the multiple information resources that contain information that can be described by the unified category information of said leaf unified category are connected to the latter.

BRIEF DESCRIPTION OF THE DRAWINGS:

[0035] In order to understand the invention and to see how it may be carried out in practice, a preferred embodiment will now be described, by way of non-limiting example only, with reference to the accompanying drawings, in which:

[0036] FIG. 1 is a block diagram illustrating an Internet electronic commerce client-server environment;

[0037] FIG. 2 illustrates a schematic structure of an on-line catalog within an Internet site;

[0038] FIG. 3 is a flowchart illustrating a set of user query phase steps, according to one embodiment of the invention; and

[0039] FIGS. 4A-4B are two schematic illustrations depicting an example of respective input and output learn phases.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT

Introduction

[0040] The key elements of the method of the preferred embodiment will be presented in terms of an electronic commerce application over an Internet client server environment in which the information integration capabilities of this invention are advantageous. Further environments and different configurations and the modifications they entail will be specified below.

[0041] The application of Internet Electronic Commerce is based on the use of electronic storefronts and on-line catalogs. These catalogs are built specifically to enable customers to electronically browse in search of goods. From a customer's perspective, the business interaction process of identifying the right products and services, locating potential suppliers, and closing a deal that provides the best value for the money involves a great deal of repetitive browsing and tedious comparison work.

[0042] The preferred embodiment described provides an information integration solution to buyers obtaining information from multiple product information resources using a single unified access interface.

Electronic Commerce Client Server System Over the Internet

[0043] Referring to FIG. 1, an Internet client-server system upon which an embodiment of the present invention could be implemented is shown as 100. An Internet client-server system 100 comprises a set of on-line catalogs 110, an information integration server computer system 120, and an end-user browsing system 130.

[0044] The end-user browsing system 130 might be a personal computer, a network computer, a television with Internet operating device, or any other system setup, as long as it enables the user to interact with the Internet via a standard browser-like mechanism. As a matter of default, it enables communication with any Internet World Wide Web site, the display of standard Internet pages (the current standard is HTML), and the selection of new pages by means of hyperlink selection and full address (URL) specification.

[0045] A set of on-line catalogs 110 comprises, as an example, three on-line catalogs 111, 112, 113. An information integration server computer system 120 comprises an input device 121, a processor 122, a display 123, a memory 124, and standard Internet server software. The server software manages the communication of the information integration server computer system 120 with the Internet.

[0046] The Internet is shown as a cloud 140. This cloud symbolizes the global inter-system communication done via standard Internet protocols using standard Internet infrastructure. The communication and content delivery standards such as http, HTML, and so forth, are not an essential part of the present invention.

[0047] The invention is preferably implemented as software, and is installed on the information integration server computer system 120.

[0048] 0.1. An Example of Internet On-line Catalog Logical Structure

[0049] Referring to FIG. 2, an Internet on-line catalog logical structure example is shown as 200. The Internet on-line catalog logical structure is represented by a mesh of nodes and edges. Each node represents an HTML page. Each edge represents an HTML hyperlink to another HTML page. The HTML hyperlink is possibly, but not necessarily, of the basic form

<A HREF="URL">Text</A>

[0050] where “Text” is the text appearing on the user's browser as the hyperlink name and “URL” is the address of the page to be accessed when the user selects this hyperlink, e.g., using his input device. Also, the specific hyperlink syntax is not an essential part of the present invention. Rather, the method could be easily adapted to each new World Wide Web page and hyperlink compatible model.

[0051] A basic assumption is that for each external request of an HTML hyperlink, the Internet on-line catalog provides the relevant HTML page.

[0052] In one embodiment of the invention, it can readily be seen that the logical mesh of pages and links actually induces a graph data structure. This induced graph is not created; rather, it can be thought of as a view into the on-line catalog structure. This induced graph will be referred to as the “link graph,” and is defined below.

[0053] The invention provides a method to unify any number of link graphs from multiple, often remote information resources. The resulting graph is termed a Unified Classification Information Graph (UCIG).

[0054] Integration of the link graph and its unification into the UCIG, together with a user's ability to use the UCIG information without loss of information recall or precision constitutes one variant of the invention.

A Two-Phase Operation

[0055] The invention includes two main operational phases: the “learn phase” and the “user query phase.” In the first phase, hereinafter the learn phase, according to the invention, each of the on-line catalogs in the on-line catalog set 110 is accessed and a special representation of its classification information is created. Then, all the on-line catalog representations are unified into the single unified classification graph: the Unified Classification Information Graph (UCIG). The unified classification graph also includes connections to the information pages themselves, located in the on-line catalogs. An essential part of the present invention is the automatic creation of the Unified Classification Information Graph.

[0056] In the second phase (FIG. 3), hereinafter the user query phase, the user uses his end-user system 130 to issue a query to the information integration server software about a required product or service. Then, the information integration server software uses the stored unified classification to identify several relevant categories. Preferably, but not necessarily, the information integration server software may present the user with these categories. The user may select a subset of these categories as his final query. The information integration server software then accesses the relevant on-line catalogs and obtains the relevant product information pages. As an optional step, it may collate the information, and filter it according to filters prepared at the learn phase. Finally, it sends the information to the end-user system 130 to be presented to the user.

[0057] 0.1.1 The Learn Phase

[0058] FIGS. 4A-4B depict an example of input and output of the learn phase. FIG. 4A illustrates three Link Graphs as derived from input information resources and the resulting UCIG, which has been generated by the algorithm described below.

[0059] As stipulated above, the resulting UCIG is connected to all information pages that are connected to the original link graphs.

[0060] Thus, the category “cables & connectors” of the input information resources depicted in FIG. 4A is present in FIG. 4B in the Unified Classification Information Graph, albeit in slightly different form. Thus, the categories of the input virtually reside also in the output.

Step 1: Initialization

[0061] In this step, a subset of the induced link graph is created according to the following rules:

[0062] 1. A node (Text, URL) is created if it was induced by a hyperlink included in a categories page of an on-line catalog, and the URL points to either a categories page or a product information page within the on-line catalog (see “Page Type Identification” section below). The text is referred to as the link graph category.

[0063] 2. An edge is created if and only if both nodes of the edge are created.

[0064] 3. If any of the created nodes does not have more than one outgoing edge, i.e., if the node URL points only to a product page, then it constitutes a “leaf node.” Otherwise, i.e., if the node URL points to a categories page, the node is a “non-leaf node,” denoted only as a parent node.

[0065] 4. An additional node is created for the on-line catalog root page Node (on-line catalog name, on-line catalog root URL). An edge is added between the root and all the nodes representing HTML hyperlinks contained in the on-line catalog categories root URL.

[0066] The resulting created graph is called the “link graph” (denoted by LinkGraph) and represents one embodiment of classification of the on-line catalog. According to one embodiment of the invention, the system administrator uses a formal graph notation to manually describe the graph, based on the creation guidelines. According to this approach, the notation is read and the corresponding graph data structure is created. This process is well known in the literature and needs no further explanation.
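The four creation rules can be illustrated with a short sketch. It is only a minimal illustration under stated assumptions: the catalog is assumed to have been crawled already into a dictionary mapping each categories-page URL to the (text, URL) hyperlinks found on it, and a hypothetical set product_pages stands in for the Page Type Identification step. The function name and argument names are not part of the specification.

```python
from collections import deque

def build_link_graph(catalog_name, root_url, category_pages, product_pages):
    """Build a link graph from already-crawled pages (illustrative only).

    category_pages: dict mapping a categories-page URL to a list of
        (text, url) hyperlinks found on that page (assumed input).
    product_pages: set of URLs classified as product information pages
        (stand-in for the Page Type Identification step).
    Returns (nodes, edges, leaves); each node is a (text, url) pair.
    """
    root = (catalog_name, root_url)            # rule 4: catalog root node
    nodes, edges, leaves = {root}, set(), set()
    queue, expanded = deque([root]), {root_url}
    while queue:
        parent = queue.popleft()
        for text, url in category_pages.get(parent[1], []):
            # rule 1: keep only links to categories or product pages
            if url not in category_pages and url not in product_pages:
                continue
            node = (text, url)
            nodes.add(node)
            edges.add((parent, node))          # rule 2: both end nodes exist
            if url in product_pages:
                leaves.add(node)               # rule 3: leaf node
            elif url not in expanded:
                expanded.add(url)              # expand each categories page once
                queue.append(node)
    return nodes, edges, leaves
```

Links that lead outside the catalog hierarchy (for example an "About us" page) are dropped by rule 1, so the resulting graph contains only classification information.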

Step 2: Normalization

[0067] The next step in the learn phase involves the creation of what is called the “classification graph,” which is a directed graph representing a normalized classification form in the sense that it obeys certain classification description rules. The classification graph is denoted by ClassificationGraph. The LinkGraph is kept for use in later stages (the LinkGraph embodies the connection to each single on-line catalog relevant page).

Step 2.1: ClassificationGraph Construction

[0068] 2.1.1. For Every Node in the LinkGraph, Generate a Node in the ClassificationGraph That Contains the Following Information

[0069] 1. CIG Category: initialized by the same content as the generating LinkGraph node <Text> field, represented as a list of phrases as defined by CIG category below.

[0070] 2. A list of LinkGraph “node connections” (defined below) initialized to a single connection to the original LinkGraph node generating this ClassificationGraph node. A LinkGraph node connection is comprised of:

[0071] a UCIG node;

[0072] a LinkGraph node;

[0073] a Positive Filter: a set of tokens that are derived from the match between the UCIG node and its connected LinkGraph node; and

[0074] a Negative Filter: a set of tokens that are derived from the mismatch between the UCIG node and its connected LinkGraph node.

[0075] A Node Connection goal is that a UCIG node should be connected to LinkGraph nodes that have access to product information matching the UCIG node category. If the LinkGraph node accesses product pages such that only part of them match the UCIG node category, then we use Positive and Negative filters in order to filter products, and retrieve only the products relevant to the category. For example, a UCIG node with a category “586 laptop” will be connected, using a LinkGraph node connection, to all LinkGraph nodes that contain products of this type.
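A node connection and its two filters might be represented as in the following sketch. The class and method names are hypothetical. The matching rule in accepts is an assumption made for illustration (every positive token must appear in the product text and no negative token may appear), since the text above does not fix how the filters are evaluated.

```python
from dataclasses import dataclass, field

@dataclass
class NodeConnection:
    """One LinkGraph node connection (illustrative field names)."""
    ucig_node: str                    # category of the UCIG node
    link_graph_node: tuple            # (text, url) of the LinkGraph node
    positive_filter: set = field(default_factory=set)
    negative_filter: set = field(default_factory=set)

    def accepts(self, product_text):
        # Assumed rule: reject on any negative token, then require
        # all positive tokens. Empty filters accept every product.
        tokens = set(product_text.lower().split())
        if self.negative_filter & tokens:
            return False
        return self.positive_filter <= tokens
```

For example, a connection from a UCIG node “cable” to a LinkGraph node “cables and connectors” could carry the positive filter {"cable"} and the negative filter {"connector"}, so that only cable products reachable through that LinkGraph node are retrieved.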

[0076] 2.1.2 An Edge is Drawn Between Two ClassificationGraph Nodes If and Only If an Edge was Drawn Between Their Generating LinkGraph Nodes

Step 2.2: CIG Categories Transformations

[0077] The following is a typical yet not an exclusive embodiment of the normalization process which can differ from graph to graph.

[0078] In this step, tokens in various forms are taken and manipulated according to pre-set rules that can change from graph to graph. They are as follows:

[0079] 1. Transform letters from upper case to lower case.

[0080] 2. Transform all plural forms to singular forms, e.g., “tables” is transformed to “table.”

[0081] 3. Represent a category text as its individual phrases (as defined below), e.g., “extension cord, power cables” comprises the phrases “extension cord,” “,” and “power cables.” The phrase “extension cord” comprises the tokens “extension” and “cord.” The phrase “power cables” comprises the tokens “power” and “cables.”

[0082] 4. Transform phrases designated as needing the separator “−” between them to the correct phrases, e.g., “on line” becomes “on-line.”

[0083] 5. Transform a string of letters to its string of letters synonym as defined in the knowledge base, e.g., the string “phone” is transformed to “telephone.”

[0084] 6. Transform “−and” to “and.”

[0085] 7. Insert a space after the last digit in a string comprising numbers and text, e.g., “100MHz” becomes “100 MHz.”

[0086] 8. Transform text of the form “X and Y accessories” to “X accessories and Y accessories.”

[0087] 9. Transform text of the form “X accessories” to “X and X accessories.”

[0088] 10. If the category contains a number (by digits or text, e.g., “two to four,” “17 inch”), then handle the intervals as described below in the “Intervals” section.
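Several of the transformations above can be sketched as a small pipeline. This is a minimal illustration covering only rules 1, 2, 4, 5, 7 and 8. The SYNONYMS and HYPHENATED tables are hypothetical stand-ins for the knowledge base, and the plural handling is deliberately naive (it simply strips suffixes and would mishandle irregular nouns).

```python
import re

SYNONYMS = {"phone": "telephone"}      # hypothetical knowledge-base entries
HYPHENATED = {"on line": "on-line"}    # hypothetical knowledge-base entries

def singular(token):
    # Naive suffix stripping; a real system would need a lexicon.
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token

def normalize_category(text):
    text = text.lower()                                  # rule 1
    for src, dst in HYPHENATED.items():                  # rule 4
        text = text.replace(src, dst)
    text = re.sub(r"(\d)([a-z])", r"\1 \2", text)        # rule 7
    tokens = [singular(t) for t in text.split()]         # rule 2
    tokens = [SYNONYMS.get(t, t) for t in tokens]        # rule 5
    text = " ".join(tokens)
    # rule 8: "x and y accessory" -> "x accessory and y accessory"
    m = re.fullmatch(r"(.+) and (.+) accessory", text)
    if m:
        text = f"{m.group(1)} accessory and {m.group(2)} accessory"
    return text
```

The order of the rules matters in this sketch: singularization runs before the synonym lookup so that “Phones” is reduced to “phone” and then mapped to “telephone.”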

Intervals

[0089] Given a sequence of tokens in the form of a single number or a range of numbers bounded by a minimum value and a maximum value, and optionally a measurement unit, the goal is to represent the given token sequence as an interval token, comprised of:

[0090] Minimum number (min);

[0091] Maximum number (max); and

[0092] Measurement, if given (unit).

[0093] Interval construction is as follows. Given a text that contains a number, it is compared to predefined “interval templates” in order to construct its interval representation.

[0094] The following are several examples of interval templates:

[0095] 1. Two phrases comprised of numbers with an interval-separator token between them signifying a set of numbers such that the token before the interval separator is the lowest limit of the boundary and the token after the separator is the highest limit. Examples of interval-separator tokens include “−”, “to”, “:”, and so forth.

[0096] 2. Any string which has one number token with a space before and after with or without a set of tokens signifying measurements such as MHz, GB, ‘″’, cm, lb, and so forth, hereinafter termed a “unit,” and/or a set of tokens such as “and above,” or “and over,” and so forth.

EXAMPLE 1

[0097] Text: “17 to 19″”;

[0098] Interval-template: min interval-separator max unit;

[0099] Interval representation: min=17, max=19, unit=‘″’.

EXAMPLE 2

[0100] Text: “17 and up”;

[0101] Interval-template: {min and up};

[0102] Interval representation: min=17, max=MaxInt, unit=“”.
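The two examples can be reproduced with a small template matcher. This is a sketch under stated assumptions: the interval templates are expressed as regular expressions, the unit and separator lists are illustrative rather than exhaustive, sys.maxsize stands in for MaxInt, and numbers written as words (e.g., “two to four”) are not handled.

```python
import re
import sys

MAX_INT = sys.maxsize                  # stand-in for MaxInt in Example 2
UNITS = r'mhz|gb|cm|lb|inch|"|″'       # assumed, non-exhaustive unit list
SEPARATORS = r"-|to|:"                 # assumed interval-separator tokens

# Template 1: min interval-separator max [unit]
RANGE = re.compile(rf'(\d+)\s*(?:{SEPARATORS})\s*(\d+)\s*({UNITS})?', re.I)
# Template 2 with an open upper bound: min [unit] and up/above/over
OPEN = re.compile(rf'(\d+)\s*({UNITS})?\s*and\s+(?:up|above|over)', re.I)
# Template 2 with a single number: n [unit]
SINGLE = re.compile(rf'(\d+)\s*({UNITS})?', re.I)

def parse_interval(text):
    """Return (min, max, unit), or None if the text contains no number."""
    m = RANGE.search(text)
    if m:
        return int(m.group(1)), int(m.group(2)), m.group(3) or ""
    m = OPEN.search(text)
    if m:
        return int(m.group(1)), MAX_INT, m.group(2) or ""
    m = SINGLE.search(text)
    if m:
        n = int(m.group(1))
        return n, n, m.group(2) or ""
    return None
```

The templates are tried from most to least specific, so that “17 to 19″” is read as a range rather than as the single number 17.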

[0103] Those versed in the art will readily appreciate that some normalization steps may be deleted, or modified, and others can be added, all as required and appropriate, depending upon the particular application.

Definitions. e.g., That Apply to the Present Examples

[0104] Token: A string in the category text that is bounded by a space character or the category beginning or ending. E.g., the tokens of the category “computer 486/586” are: {computer, 486, /, 586}.

[0105] Text-Separator: A token of the type, but not limited to: “and”, “or”, “&”, “/”, “,”, “−”.

[0106] Phrase: Any combination of one or more continuous tokens separated by a text-separator, or the category beginning or ending, e.g., category “laser printer & plotter,” phrases: “laser printer,” “&,” “plotter.”

[0107] CIG Category: The series of phrases that is the output of processing a LinkGraph category. For example, if the original LinkGraph category is “scanners & digital cameras,” then the output UCIG category comprises the following phrases: “scanners,” “&,” and “digital cameras.”
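The token and phrase definitions can be made concrete with a short sketch. The function names are hypothetical, and the separator set uses the ASCII hyphen in place of the “−” character. Following the examples above, separators are emitted as phrases of their own.

```python
import re

TEXT_SEPARATORS = {"and", "or", "&", "/", ",", "-"}

def tokenize(category):
    # Split on whitespace and peel "&", "/" and "," off as their own
    # tokens, so that "486/586" yields "486", "/", "586".
    return re.findall(r"[&/,]|[^\s&/,]+", category)

def phrases(category):
    """Group consecutive non-separator tokens into phrases."""
    result, current = [], []
    for token in tokenize(category):
        if token.lower() in TEXT_SEPARATORS:
            if current:
                result.append(" ".join(current))
                current = []
            result.append(token)        # the separator is kept as a phrase
        else:
            current.append(token)
    if current:
        result.append(" ".join(current))
    return result
```

Applied to “laser printer & plotter,” the sketch returns the three phrases “laser printer,” “&,” and “plotter,” matching the Phrase definition.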

[0108] Throughout this document, the term “category” refers to a Unified Classification Information Graph category.

Special Tokens (Given in the Knowledge Base)

[0109] Neutral Tokens: Tokens that do not add information to the category, e.g., “system,” “product,” and “miscellaneous.”

[0110] Phrase Head Token: The token that is the head of the noun phrase, e.g., the head of “cable of printer” is “cable” while the head of “monitor connector” is “connector.”

[0111] Category Head Tokens: From each phrase in the category we take its noun phrase head. This constitutes the category head tokens, e.g., the category head tokens of “cables for printer and monitor connectors” are “cables” and “connectors.”

Step 3: Integration

[0112] As each new on-line catalog is learned, the output of the previous steps creates a ClassificationGraph, which is a directed acyclic graph that represents a normalized classification of the classification found in the on-line catalog. Also, the ClassificationGraph nodes contain pointers to nodes of the LinkGraph.

[0113] If this is the very first on-line catalog that is being learned, then the ClassificationGraph becomes the Unified Classification Information Graph (UCIG), also denoted by UnifiedClassificationInformationGraph, that represents the cumulative classification knowledge that has been learned.

[0114] If this is not the first on-line catalog, then the next step in the learn phase is to unify the newly generated ClassificationGraph, denoted by NewGraph, with the existing UnifiedClassificationInformationGraph.

[0115] A. Integrate NewGraph into UCIG

[0116] To integrate a NewGraph into the UCIG,

[0117] A.1. Initialize NonHandledNodesQueue to a queue of all nodes in the NewGraph, entered by their order in breadth first search (BFS) traversal on the NewGraph. Thus, a parent node will always be ahead of its descendants in this queue.

[0118] A.2. Initialize HandledNodesQueue to an empty queue of nodes.

[0119] A.3. While the NonHandledNodesQueue is not empty:

[0120] A.3.1 NewNode=Top of NonHandledNodesQueue.

[0121] A.3.2 Integrate NewNode into the unified ClassificationGraph (the UnifiedClassificationInformationGraph further defined below).

[0122] A.3.3 Remove NewNode from the NonHandledNodesQueue, and add it into the HandledNodesQueue.

[0123] A.4. Now the HandledNodesQueue contains all nodes from the UCIG, but in an opposite order than the initialized NonHandledNodesQueue, such that a child node is always ahead of its parent. Clean from the UCIG all nodes from the NewGraph that had children nodes in the NewGraph, such that all these children nodes are already integrated (by unify or by “add edge” as defined below) into the UCIG.
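Steps A.1 through A.3 can be sketched as a simple driver loop. This is an illustration only: the NewGraph is assumed to be given as a root plus a dictionary of child lists, integrate_node is a hypothetical callback standing in for step B, and the clean-up of step A.4 is omitted. New entries are pushed to the front of the handled queue so that it ends up in the reverse of the BFS order.

```python
from collections import deque

def bfs_order(root, children):
    """Nodes of the graph in breadth-first order, starting from the root."""
    order, seen, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order

def integrate_graph(root, children, integrate_node):
    non_handled = deque(bfs_order(root, children))   # A.1
    handled = deque()                                # A.2
    while non_handled:                               # A.3
        new_node = non_handled[0]                    # A.3.1
        integrate_node(new_node)                     # A.3.2 (step B)
        non_handled.popleft()                        # A.3.3
        handled.appendleft(new_node)                 # children end up first
    return list(handled)
```

For a tree-shaped NewGraph, the returned list places every child ahead of its parent, which is the ordering that step A.4 relies on.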

[0124] B. Integrate NewNode into the UCIG

[0125] In the description of steps B, it is assumed by A.1 above that all ancestors of NewNode are already integrated.

[0126] B.1. Get candidates for NewNode:

[0127] B.1.1 Prepare RelevantTokens, the set of tokens that will generate candidates:

[0128] B.1.1.1 Denote RelevantTokens as all tokens from NewNode.

[0129] B.1.1.2 Remove from RelevantTokens irrelevant tokens, e.g., remove neutral tokens, and remove tokens that already appear in one of the ancestors of the NewNode.

[0130] B.1.2 Prepare RelevantCandidates, the set of nodes from UCIG that are candidates for integration with NewNode:

[0131] If the NewNode is the NewGraph root node,

[0132] then the RelevantCandidates are the root nodes of UCIG.

[0133] Else

[0134] Initialize RelevantCandidates to an empty set;

[0135] For each token in RelevantTokens, add all nodes in UCIG containing this token into RelevantCandidates (if not already there);

[0136] For every node in RelevantCandidates, if it does not contain one of the category head tokens, remove it from the RelevantCandidates.
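Candidate selection for a non-root NewNode can be sketched as two set operations. The sketch assumes the UCIG is available as a dictionary mapping a node identifier to its set of tokens, and it reads [0136] as requiring a candidate to contain at least one of the NewNode category head tokens. Both the data layout and that reading are assumptions, and the root-node case of [0131]-[0132] is left out.

```python
NEUTRAL_TOKENS = {"system", "product", "miscellaneous"}   # from [0109]

def relevant_tokens(new_node_tokens, ancestor_tokens):
    # B.1.1: drop neutral tokens and tokens already seen in ancestors.
    return {t for t in new_node_tokens
            if t not in NEUTRAL_TOKENS and t not in ancestor_tokens}

def relevant_candidates(tokens, head_tokens, ucig):
    """B.1.2 for a non-root node; ucig maps node id -> set of tokens."""
    # Collect every UCIG node sharing at least one relevant token.
    candidates = {n for n, toks in ucig.items() if toks & tokens}
    # Keep only candidates containing a category head token (assumed reading).
    return {n for n in candidates if ucig[n] & head_tokens}
```

Filtering by head token keeps a node such as “printer cable” as a candidate for “cable” while discarding “laser printer,” which shares a token with the category path but describes a different kind of product.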

[0137] B.2. Initializations:

[0138] Initialize NewNodeTotalNonMatchedPhrases to the set of all phrases in the NewNode. In the next section B.3, the set is updated such that only phrases that were not integrated in any way to the existing nodes in the UCIG will remain.

[0139] Initialize CandidatesContainingNewNode to an empty set of nodes.

[0140] Initialize CandidatesContainedInNewNode to an empty set of nodes.

[0141] B.3 Handle candidates:

[0142] For each candidate in the RelevantCandidates:

[0143] B.3.1 Check Match Level from the NewNode to Candidate (see section C below), and initialize the following:

[0144] NewNodeToCandidateMatchLevel,

[0145] NewNodeMatchedPhrases being phrases in the NewNode that match the candidate, and

[0146] NewNodeNonMatchedPhrases being phrases in the NewNode that do not match the candidate.

[0147] B.3.2 Check Match Level from Candidate to NewNode (see section C below), and initialize the following:

[0148] CandidateToNewNodeMatchLevel.

[0149] CandidateMatchedPhrases.

[0150] CandidateNonMatchedPhrases.

[0151] B.3.4 Decide what to do according to the found match levels:

[0152] B.3.4.1 If (NewNodeToCandidateMatchLevel=FullMatch) AND (CandidateToNewNodeMatchLevel=FullMatch or PartialMatch),

[0153] i.e., if NewNode is contained in Candidate, and Candidate is fully or partially contained in NewNode, as would occur, for example, if NewNode=“cables” and Candidate=“cables and connectors,”

[0154] then

[0155] Unify NewNode into Candidate with empty filters (defined below).

[0156] Remove from NewNodeTotalNonMatchedPhrases all phrases that are in the NewNodeMatchedPhrases.

[0157] B.3.4.2. If (NewNodeToCandidateMatchLevel=PartialMatch) AND (CandidateToNewNodeMatchLevel=FullMatch or PartialMatch),

[0158] i.e., if NewNode is partially contained in Candidate, and Candidate is fully or partially contained in NewNode, as would occur, for example, if NewNode=“cables and connectors,” and Candidate=“cables,”

[0159] then

[0160] Unify NewNode into Candidate with filters NewNodeMatchedPhrases, NewNodeNonMatchedPhrases,

[0161] i.e., unify NewNode into Candidate; some of the new node phrases were found in Candidate, and some were not. Put the found phrases as positive filters, and the unfound phrases as negative filters. In the on-line search, the products connected to this category will be filtered according to the positive and negative filters, such that only products that match the filters will be shown in the search results.

[0162] Remove from NewNodeTotalNonMatchedPhrases all phrases that are in NewNodeMatchedPhrases.

[0163] B.3.4.3 If (NewNodeToCandidateMatchLevel=NoMatch) AND (CandidateToNewNodeMatchLevel=FullMatch or PartialMatch),

[0164] i.e., if Candidate is contained in NewNode and NewNode is not contained in Candidate, as would occur in the example of NewNode=“laser printer” and Candidate=“printer,”

[0165] then

[0166] Add Candidate to CandidatesContainedInNewNode.

[0167] B.3.4.4 If (NewNodeToCandidateMatchLevel=FullMatch or PartialMatch) AND (CandidateToNewNodeMatchLevel=NoMatch),

[0168] i.e., if Candidate contains NewNode, and is not contained in NewNode, as would occur in the example of NewNode=“printer,” and Candidate=“laser printer,”

[0169] then

[0170] Add Candidate to CandidatesContainingNewNode.
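The four cases of step B.3.4 amount to a small decision table over the two match levels. The sketch below is only illustrative; the `Match` enum, the `decide` function, and the returned action labels are hypothetical names, and the fall-through case (no match in either direction) is an assumption, since the text does not list it.

```python
from enum import Enum


class Match(Enum):
    FULL = "FullMatch"
    PARTIAL = "PartialMatch"
    NONE = "NoMatch"


def decide(new_to_cand, cand_to_new):
    """Map the two match levels of B.3.1 and B.3.2 to the action of B.3.4."""
    cand_matches_new = cand_to_new in (Match.FULL, Match.PARTIAL)
    if new_to_cand is Match.FULL and cand_matches_new:
        return "unify_with_empty_filters"        # B.3.4.1
    if new_to_cand is Match.PARTIAL and cand_matches_new:
        return "unify_with_filters"              # B.3.4.2
    if new_to_cand is Match.NONE and cand_matches_new:
        return "add_to_contained_in_new_node"    # B.3.4.3
    if new_to_cand in (Match.FULL, Match.PARTIAL) and cand_to_new is Match.NONE:
        return "add_to_containing_new_node"      # B.3.4.4
    # Assumption: no match in either direction means the candidate is skipped.
    return "ignore"
```

For instance, NewNode=“laser printer” against Candidate=“printer” gives NoMatch in one direction and FullMatch in the other, which selects case B.3.4.3.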

[0171] B.4. If NewNodeTotalNonMatchedPhrases is not empty,

[0172] i.e., not all phrases in the new node were unified into Candidates,

[0173] B.4.1 Add edges from relevant Candidates to NewNode:

[0174] For each Candidate in CandidatesContainedInNewNode:

[0175] Copy NewNode to NewNodeCopy.

[0176] Add edge from Candidate to NewNodeCopy (defined below).

[0177] Initialize NewNodeCopy category to {NewNode category phrases\NewNodeNonMatchedPhrases}.

[0178] If NewNode is a leaf node,

[0179] Initialize NewNodeCopy positive filter by NewNodeMatchedPhrases, and the negative filter by NewNodeNonMatchedPhrases.

[0180] Initialize NewNodeCopy parents to NewNode parents.

[0181] Remove from NewNodeTotalNonMatchedPhrases all phrases that are in NewNodeMatchedPhrases.

[0182] If NewNodeNonMatchedPhrases is empty,

[0183] Delete NewNode.

[0184] As an example, consider NewNode=“laser printer & ink plotters,” Candidate=“printer,” so the method adds an edge from the Candidate “printer” to the new node copy “laser printer,” and leaves the original “laser printer & ink plotters” as it is. It will be handled at step B.5.

[0185] B.4.2 Add edges from the NewNode to relevant Candidates:

[0186] For each Candidate in CandidatesContainingNewNode:

[0187] Add edge from NewNode to Candidate (defined below).

[0188] For example, consider NewNode=“laser printer & plotter,” Candidate=“color laser printer.” The method adds an edge from the new node “laser printer & plotter” to the Candidate “color laser printer.”

[0189] Remove from NewNodeTotalNonMatchedPhrases all phrases that are in NewNodeMatchedPhrases.

[0190] B.5 If NewNodeTotalNonMatchedPhrases is not empty,

[0191] i.e., there are phrases in NewNode that were neither unified nor given an added edge from or to any Candidate,

[0192] then

[0193] Update NewNode as follows:

[0194] B.5.1 Initialize its category to the concatenation of phrases in NewNodeTotalNonMatchedPhrases.

[0195] B.5.2 If NewNode is a leaf:

[0196] then

[0197] Initialize NewNode positive filters by NewNodeTotalNonMatchedPhrases.

[0198] Initialize NewNode negative filters by {The original set of NewNode phrases\NewNodeTotalNonMatchedPhrases}.

[0199] Else, i.e., if it is a parent,

[0200] Remove from NewNode those child nodes which have an additional parent, since this means that the additional parent is a Candidate into which the NewNode was unified, and its child nodes were therefore already added to that Candidate.

[0201] B.5.3 The NewNode parents are not changed.

An Example

[0202] Suppose NewNode=“fax and modem,” Candidate=“modem,” and no Candidate in the UCIG contains the token “fax.” Then

[0203] NewNode “fax and modem” is unified into the Candidate node “modem” with positive filters “modem” and negative filters “fax.”

[0204] NewNode name is modified to “fax,” and the following filters are added to its LinkGraph node: positive filters: “fax”; negative filters: “modem.”
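The leaf case of step B.5 can be reproduced with a few lines. This is a sketch under assumptions: `remaining_node` is a hypothetical helper, phrases are plain strings, and joining the leftover phrases with “ and ” is only one possible reading of “concatenation.”

```python
def remaining_node(original_phrases, total_non_matched):
    """Build the updated category and filters for a leaf NewNode (B.5.1, B.5.2)."""
    # Phrases that were neither unified nor attached by an edge stay in the node.
    positive = [p for p in original_phrases if p in total_non_matched]
    # Original phrases \ NewNodeTotalNonMatchedPhrases become negative filters.
    negative = [p for p in original_phrases if p not in total_non_matched]
    return {
        "category": " and ".join(positive),  # assumed concatenation format
        "positive_filters": positive,
        "negative_filters": negative,
    }


node = remaining_node(["fax", "modem"], {"fax"})
```

With the “fax and modem” example, the remaining node is named “fax,” with positive filter “fax” and negative filter “modem.”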

[0205] C. Check Match Level from Node N1 to Node N2

[0206] “Ni-Phrases” means the set of phrases of node Ni and its ancestors.

[0207] Throughout the following definitions, we ignore tokens that appear in phrases of Ni-Phrases, if the knowledge base indicates that they should be ignored. For example, we ignore the following tokens:

[0208] Neutral tokens, that do not add information to the node, e.g., “product,” “miscellaneous.”

[0209] Tokens of ancestor nodes that represent a department of product types, such as “hardware,” “office equipment,” “kitchen accessories.”

[0210] Tokens of ancestor nodes of N1 that are in the same semantic family (according to the knowledge base) as tokens in N2-phrases, e.g., if N1=storage->disk, and N2=disk, then “storage” is ignored, if in the knowledge base it appears as a token that is in the same semantic family as “disk.”

[0211] A token match level to node Ni is:

[0212] FullMatch if the token is included in Ni-Phrases.

[0213] NoMatch if the token is not included in Ni-Phrases.

[0214] An interval token, IT, match level to node Ni is:

[0215] FullMatch if Ni-Phrases contain a phrase that contains an interval token, ITi, such that IT interval boundaries are contained in ITi interval boundaries, i.e., ITi-min <= IT-min <= IT-max <= ITi-max.

[0216] PartialMatch if no FullMatch, and Ni-Phrases contain a phrase that contains an interval token, ITi, such that ITi has an overlap with IT, i.e., ITi-min <= IT-min <= ITi-max <= IT-max, or IT-min <= ITi-min <= IT-max <= ITi-max.

[0217] NoMatch if Ni-Phrases does not contain any phrase that contains an interval token, or if every interval token that is included in Ni-Phrases does not overlap with the given IT interval, i.e., for every interval token ITi in Ni-Phrases: ITi-min <= ITi-max <= IT-min <= IT-max, or IT-min <= IT-max <= ITi-min <= ITi-max.

[0218] In addition, we demand that the measurement units of compared intervals do not contradict one another.
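The interval rules can be sketched as below. Several points are assumptions rather than part of the text: intervals are modeled as `(min, max)` tuples, `MAX_INT` stands in for MaxInt, units are not checked, and any overlap that is not a full containment is treated as PartialMatch (this includes intervals that merely touch at an endpoint and the case where IT strictly contains ITi).

```python
import sys

MAX_INT = sys.maxsize  # stand-in for the open upper bound "MaxInt"


def interval_match(it, node_intervals):
    """Match level of interval token `it` against the interval tokens of Ni-Phrases."""
    lo, hi = it
    # FullMatch: some ITi encloses IT (ITi-min <= IT-min <= IT-max <= ITi-max).
    if any(n_lo <= lo and hi <= n_hi for n_lo, n_hi in node_intervals):
        return "FullMatch"
    # PartialMatch: some ITi overlaps IT without enclosing it.
    if any(n_lo <= hi and lo <= n_hi for n_lo, n_hi in node_intervals):
        return "PartialMatch"
    # NoMatch: no interval tokens at all, or none of them overlaps IT.
    return "NoMatch"
```

For example, `interval_match((14, 16), [(12, 15)])` is a partial match, while `interval_match((14, 16), [(12, MAX_INT)])` is a full match.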

[0219] A phrase match level to node Ni is:

[0220] FullMatch if the match level of every token in the phrase to Ni-Phrases is FullMatch.

[0221] PartialMatch if not FullMatch, and every regular token has FullMatch to Ni-Phrases, and every interval token has FullMatch or PartialMatch to Ni-Phrases.

[0222] NoMatch if there exists a token with NoMatch match level to Ni-Phrases.

[0223] As an example, the phrase “color printer” has FullMatch to Ni-phrases={printer, color laserjet}, while the phrase “14 to 16 inch” has PartialMatch to Ni-phrases={monitor, 12-15}, and FullMatch to Ni-Phrases={color monitor, 12 and up} since the interval “12 and up” is represented by min=12, max=MaxInt.

[0224] A category match level to node Ni is:

[0225] FullMatch if every phrase in the category has FullMatch to Ni.

[0226] PartialMatch if there exists a phrase in the category that has

[0227] PartialMatch or FullMatch to Ni, and there exists a phrase in the category that has NoMatch to Ni.

[0228] NoMatch if no phrase in the category is included in Ni.

[0229] Consider as an example, Category=“color printer and plotter” and Ni-phrases={printer, color laserjet}. In this case, the match level is PartialMatch, since one of the category phrases, “color printer,” is included in the Ni-phrases, while “plotter” is not included in the Ni-Phrases.
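For regular (non-interval) tokens, the phrase and category rules above reduce to set membership. The following sketch assumes whitespace tokenization and omits interval tokens and the knowledge-base ignore rules; `phrase_match` and `category_match` are hypothetical names.

```python
def phrase_match(phrase, node_phrases):
    """FullMatch if every token of the phrase appears somewhere in Ni-Phrases."""
    node_tokens = {t for p in node_phrases for t in p.split()}
    if all(t in node_tokens for t in phrase.split()):
        return "FullMatch"
    return "NoMatch"


def category_match(category_phrases, node_phrases):
    """Combine the phrase match levels of a category into a category match level."""
    levels = [phrase_match(p, node_phrases) for p in category_phrases]
    if all(level == "FullMatch" for level in levels):
        return "FullMatch"      # also covers an empty category such as the root
    if any(level == "FullMatch" for level in levels):
        return "PartialMatch"   # some phrases included, some not
    return "NoMatch"


level = category_match(["color printer", "plotter"], ["printer", "color laserjet"])
```

This reproduces the example: “color printer” is included and “plotter” is not, so the category level is PartialMatch.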

Check Match from N1 to N2

[0230] The Match Level from node N1 to node N2 is:

[0231] FullMatch if N1 category match level to N2 is FullMatch, and every ancestor category of N1 match level to N2 is FullMatch or PartialMatch.

[0232] PartialMatch if N1 category match level to N2 is PartialMatch, and every ancestor category of N1 match level to N2 is FullMatch or PartialMatch.

[0233] NoMatch if there exists a category from N1 or its ancestors such that its match level to N2 is NoMatch.

[0234] N1-MatchedPhrases are phrases from N1 category that are included in N2.

[0235] N1-NonMatchedPhrases are phrases from N1 category that are not included in N2.

[0236] Consider as an example,

[0237] N1=“printer & plotter”->“color”->“laserjet”;

[0238] N2=“printer”->“color”->“laserjet & inkjet”;

[0239] N3=“laserjet printer”->“8 pin”->“color”; and

[0240] N3-phrases={laserjet printer, 8 pin, color}.

[0241] The match level from N1 to N3 is FullMatch, since

[0242] The match level of “laserjet” to N3 is FullMatch.

[0243] The match level of the category “color” to N3 is FullMatch.

[0244] The match level of the category “printer & plotter” to N3 is PartialMatch.

[0245] N1 category has FullMatch to N3, and its ancestor categories have FullMatch or PartialMatch to N3.

[0246] N1-MatchedPhrases={laserjet}.

[0247] N1-NonMatchedPhrases={}.

[0248] Match Level from N2 to N3 is PartialMatch, since

[0249] The match level of “laserjet & inkjet” to N3 is PartialMatch.

[0250] The match level of the category “color” to N3 is FullMatch.

[0251] The match level of the category “printer” to N3 is FullMatch.

[0252] N2 category has PartialMatch to N3, and its ancestor categories have FullMatch or PartialMatch to N3.

[0253] N2-MatchedPhrases={laserjet}.

[0254] N2-NonMatchedPhrases={inkjet}.

[0255] Note:

[0256] The root is fully included in every node (since it has no tokens).

[0257] Match level is asymmetric: it is possible that N1 match level to N2 is full, and N2 to N1 match level is none, e.g., N1=printer, N2=color printer.
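The node-to-node rule and its asymmetry can be checked with a short sketch. It assumes regular tokens only, whitespace tokenization, and a node modeled as its path of categories from the root down (each category a list of phrases); these representations and the function names are illustrative, not taken from the method.

```python
def category_level(category, target_tokens):
    """Match level of one category (a list of phrases) against a token set."""
    included = [all(t in target_tokens for t in p.split()) for p in category]
    if all(included):
        return "FullMatch"
    if any(included):
        return "PartialMatch"
    return "NoMatch"


def node_match(path, target_path):
    """Match level from the last node of `path` to the last node of `target_path`."""
    # Ni-Phrases: phrases of the target node and all of its ancestors.
    target_tokens = {t for cat in target_path for p in cat for t in p.split()}
    *ancestors, own = path
    own_level = category_level(own, target_tokens)
    ancestor_levels = [category_level(c, target_tokens) for c in ancestors]
    if own_level == "NoMatch" or "NoMatch" in ancestor_levels:
        return "NoMatch"
    return own_level  # ancestors only need FullMatch or PartialMatch


n1 = [["printer", "plotter"], ["color"], ["laserjet"]]
n2 = [["printer"], ["color"], ["laserjet", "inkjet"]]
n3 = [["laserjet printer"], ["8 pin"], ["color"]]
```

With these paths, `node_match(n1, n3)` is a full match and `node_match(n2, n3)` is a partial match, as in the worked example; swapping “printer” and “color printer” shows the asymmetry.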

Unify a CIG NewNode into a UCIG Candidate

[0258] Given:

[0259] NewNode;

[0260] Candidate;

[0261] MatchedPhrases; and

[0262] NonMatchedPhrases.

[0263] Process:

[0264] 1. If NewNode is a leaf and Candidate is a parent:

[0265] Add NewNode LinkGraph connections to Candidate unknown child (defined below) connections (if the candidate does not have an unknown child, create it).

[0266] 2. If NewNode is a leaf and Candidate is a leaf:

[0267] Add the LinkGraph connections of NewNode to Candidate node connections.

[0268] 3. If NewNode is a parent and Candidate is a parent:

[0269] Add NewNode children that match the MatchedPhrases and do not match the NonMatchedPhrases to the Candidate children.

[0270] 4. If NewNode is a parent and Candidate is a leaf:

[0271] Add NewNode children that match the MatchedPhrases and do not match the NonMatchedPhrases as children of Candidate.

[0272] Create an unknown child for Candidate.

[0273] Move the Candidate LinkGraph connections to its unknown child (defined below) connections.

Unknown Child

[0274] An unknown child holds connections to LinkGraph nodes that are not known to be related to any of the sibling nodes of the unknown child. There is at most one unknown child to any parent node. This unknown child is invisible to the user. See the user query (“Query Phase”) section to see how unknown children are handled given the user query.

[0275] As an example, suppose “printer” was mentioned on catalog c1 on a link pointing to a categories page with the categories “laserjet printer” and “inkjet printer.” On catalog c2, “printer” was pointing to a product page. Then when the user asks for printer, he will be presented with “printer,” with children “laserjet printer” and “inkjet printer” connected to c1 LinkGraph connections. The connection to the catalog c2 product page of printers could not be attached to one of these children, since it holds unclassified printers that are not known to belong to “laserjet” or “inkjet” or any other printer. So, we add an unknown child (invisible to the user), and connect it to the c2 printer product page.
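The four unify cases and the unknown child can be sketched with a small node class. Everything here is an assumption made for illustration: the `Node` fields, the helper names, and in particular the substring test used to decide whether a child “matches” a phrase, which stands in for the match-level check of section C.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)
    connections: list = field(default_factory=list)  # LinkGraph connections
    unknown_child: Optional["Node"] = None           # invisible to the user

    def is_leaf(self):
        return not self.children


def get_unknown_child(node):
    """Return the node's unknown child, creating it on first use."""
    if node.unknown_child is None:
        node.unknown_child = Node("<unknown>")
    return node.unknown_child


def unify(new_node, candidate, matched, non_matched):
    def child_ok(child):
        # Stand-in for phrase matching: plain substring tests on the child name.
        positive = not matched or any(p in child.name for p in matched)
        negative = any(p in child.name for p in non_matched)
        return positive and not negative

    if new_node.is_leaf():
        if candidate.is_leaf():                       # case 2
            candidate.connections.extend(new_node.connections)
        else:                                         # case 1
            get_unknown_child(candidate).connections.extend(new_node.connections)
    else:
        if candidate.is_leaf():                       # case 4
            get_unknown_child(candidate).connections.extend(candidate.connections)
            candidate.connections = []
        # cases 3 and 4: adopt the children that pass the filters
        candidate.children.extend(c for c in new_node.children if child_ok(c))


printer = Node("printer", children=[
    Node("laserjet printer", connections=["c1/laserjet"]),
    Node("inkjet printer", connections=["c1/inkjet"]),
])
unify(Node("printer", connections=["c2/printers"]), printer, [], [])
```

After the call, the c2 product page hangs off the hidden unknown child of “printer,” while the two visible children keep their c1 connections.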

[0276] Add an edge from node N1 to node N2

[0277] If N1 is a parent node,

[0278] Then

[0279] Add an edge from N1 to N2.

[0280] Else, if N1 is a leaf node,

[0281] Then

[0282] Add an edge from N1 to N2.

[0283] Add an unknown child to N1.

[0284] Move N1 LinkGraph connections to its unknown child LinkGraph connections.
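The “add an edge” operation differs between the two branches only in what happens to the connections of a leaf N1. A minimal sketch, using a hypothetical `Node` class whose field names are assumptions:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)
    connections: list = field(default_factory=list)  # LinkGraph connections
    unknown_child: Optional["Node"] = None


def add_edge(n1, n2):
    """Add an edge from n1 to n2, preserving the connections of a leaf n1."""
    if not n1.children:
        # n1 is a leaf: its own connections move to a hidden unknown child.
        if n1.unknown_child is None:
            n1.unknown_child = Node("<unknown>")
        n1.unknown_child.connections.extend(n1.connections)
        n1.connections = []
    n1.children.append(n2)


printer = Node("printer", connections=["c2/printers"])
add_edge(printer, Node("laser printer"))
```

Moving the connections first keeps the leaf's product pages reachable once the node becomes a parent.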

Step 4: Periodic Update of Integration

[0285] This is as follows:

[0286] Given:

[0287] UCIG;

[0288] An old LinkGraph connected to UCIG; and

[0289] A new LinkGraph constructed from the same resource, where part of the new LinkGraph is identical to the old LinkGraph, part of it is newly added, and part of the old LinkGraph does not exist any more in the new LinkGraph,

[0290] the goal is to reintegrate the LinkGraph into the UCIG to reflect the changes that were made to the on-line catalog.

[0291] The process includes:

[0292] 1. Update the Matched Graph under the new LinkGraph root and old LinkGraph root (as defined below). This is a recursive function that, for each node in the new LinkGraph that is found in the old LinkGraph that is connected to the unified graph, updates the URLs in the unified graph, and deletes the node from the new graph.

[0293] 2. Integrate the new LinkGraph into the UCIG as defined in the learn phase. This will integrate only the new nodes that appear in the new LinkGraph and not in the old LinkGraph, since the nodes which did not change between the LinkGraphs were deleted at the previous step 1.

[0294] 3. Remove from the UCIG all connections to old LinkGraph nodes.

[0295] 4. Clean from the UCIG any leaf nodes that are not connected to any LinkGraph nodes. Those nodes were connected to the old LinkGraph nodes that do not exist any more in the new LinkGraph nodes.

Update the Matched Graph (NewNode, OldNode)

[0296] The following is a description of an update function, where NewNode is a node in the new LinkGraph and OldNode is a node in the old LinkGraph. The function returns true if and only if the new node and its descendants are already connected to the UCIG.

[0297] 1. Find the UCIG nodes that are connected to OldNode.

[0298] 2. If found such UCIG nodes,

[0299] Then

[0300] Update in their LinkGraph connection the URL of NewNode instead of the URL of OldNode.

[0301] 3. If NewNode is a leaf,

[0302] Then if found connection,

[0303] Return true.

[0304] Else

[0305] Return false.

[0306] In the following step (4), for a parent NewNode, the function recursively checks whether all of the parent's children are in the unified graph; if so, it returns true, and NewNode will be deleted by its calling function.

4. If NewNode is a parent (non-leaf),

For each NewChild of NewNode,

For each OldChild of OldNode,

If (match level from NewChild to OldChild is FullMatch) AND (match level from OldChild to NewChild is FullMatch),

If (Update Matched Graph (NewChild, OldChild)),

Delete NewChild.

If all NewChild nodes were deleted,

Return true.

Else

Return false.
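The recursive update can be sketched as follows. The data model is an assumption: `LinkNode` is a hypothetical class, `ucig_links` is a dictionary from a LinkGraph URL to the UCIG node names connected to it, and equality of category strings stands in for the mutual FullMatch test. The sketch also stops at the first matching OldChild, which the text does not require.

```python
from dataclasses import dataclass, field


@dataclass
class LinkNode:
    category: str
    url: str
    children: list = field(default_factory=list)


def update_matched_graph(new_node, old_node, ucig_links):
    """Return True iff new_node and all its descendants are already in the UCIG."""
    # Steps 1-2: re-point UCIG connections from the old URL to the new URL.
    found = old_node.url in ucig_links
    if found:
        ucig_links[new_node.url] = ucig_links.pop(old_node.url)
    # Step 3: a leaf is covered exactly when a connection was found.
    if not new_node.children:
        return found
    # Step 4: recurse into children that match an old child in both directions.
    for new_child in list(new_node.children):
        for old_child in old_node.children:
            if new_child.category == old_child.category:  # stand-in for FullMatch
                if update_matched_graph(new_child, old_child, ucig_links):
                    new_node.children.remove(new_child)
                break  # assumption: at most one old child matches
    return not new_node.children


old_root = LinkNode("catalog", "old/", [LinkNode("cables", "old/a"), LinkNode("modem", "old/b")])
new_root = LinkNode("catalog", "new/", [LinkNode("cables", "new/a"), LinkNode("fax", "new/c")])
ucig_links = {"old/a": ["cables"], "old/b": ["modem"]}
all_known = update_matched_graph(new_root, old_root, ucig_links)
```

In this run the unchanged “cables” node is re-pointed and deleted from the new graph, so only the newly added “fax” node is left for the learn-phase integration.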

Step 5: The Query Phase

[0307] Referring to FIG. 3, the steps of query phase include:

[0308] Step 301: The user uses his end-user system to issue a query to the information integration server software about a required product or service. The query is a free text description of the target product type.

[0309] Step 302: The user's string is subjected to the normalization process of the learn phase as if it were a node's category. The result is a CIG node representation of the user query, denoted by QueryNode.

[0310] Step 303: Apply the get candidates process for QueryNode as described above in step B.1.

[0311] Filter candidates that do not contain the QueryNode head tokens.

[0312] For each candidate in the list:

[0313] 1. If (match level from candidate to QueryNode is FullMatch or PartialMatch) AND (match level from QueryNode to candidate is FullMatch or PartialMatch),

[0314] Then

[0315] Add the candidate to FullMatchedCandidates.

[0316] 2. If (match level from candidate to QueryNode is FullMatch or PartialMatch) AND (match level from QueryNode to candidate is NoMatch),

[0317] Then

[0318] Add the candidate to QueryContainingCandidates.

[0319] 3. If (match level from QueryNode to candidate is FullMatch or PartialMatch) AND (match level from candidate to QueryNode is NoMatch),

[0320] Then

[0321] Add the candidate to QueryIsContainedInCandidates.

[0322] Note: We are now left with 3 groups, FullMatchedCandidates, QueryContainingCandidates, and QueryIsContainedInCandidates.

[0323] If FullMatchedCandidates is not empty,

[0324] Show the user the FullMatchedCandidates, and skip to step 304.

[0325] If QueryContainingCandidates is not empty,

[0326] Show the user the QueryContainingCandidates, and skip to step 304.

[0327] Otherwise

[0328] Show the user the QueryIsContainedInCandidates.
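The grouping and the display priority of step 303 can be sketched as below. The input format (a list of tuples carrying precomputed match levels) and the function name are assumptions. The sketch also assumes that the third group is the mirror image of the second, i.e., the query matches the candidate but the candidate does not match the query.

```python
MATCHED = ("FullMatch", "PartialMatch")


def select_candidates(candidates):
    """candidates: list of (name, candidate_to_query, query_to_candidate)."""
    full_matched, query_containing, query_contained_in = [], [], []
    for name, cand_to_query, query_to_cand in candidates:
        if cand_to_query in MATCHED and query_to_cand in MATCHED:
            full_matched.append(name)
        elif cand_to_query in MATCHED and query_to_cand == "NoMatch":
            query_containing.append(name)    # candidate is contained in the query
        elif query_to_cand in MATCHED and cand_to_query == "NoMatch":
            query_contained_in.append(name)  # query is contained in the candidate
    # Show the first non-empty group, in the priority order given above.
    return full_matched or query_containing or query_contained_in


# Query "laser printer" against two hypothetical UCIG candidates.
shown = select_candidates([
    ("printer", "FullMatch", "NoMatch"),
    ("color laser printer", "NoMatch", "FullMatch"),
])
```

Here no candidate matches in both directions, so the user is shown the broader node “printer” rather than the narrower “color laser printer.”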

[0329] Step 304: This step enables the user to refine the node selected by step 303. All nodes from the UCIG that are in MatchedCandidates, accepted by step 303, are presented. The user may select any node, or may further expand it to select one of its child nodes. Nodes of type “unknown node” are not shown to the user. Those are internal nodes, used in step 305. The process ends when the user finishes selecting nodes.

[0330] Step 305: For each node in the final list, and for each LinkGraph node of the respective node, and from each leaf node accessible from a respective LinkGraph, a product page URL contained in said leaf is obtained. If the user selects a node which has a sibling node of type “unknown node,” then this “unknown node” is automatically selected, and for its connections dynamic positive filters containing all phrases from selected nodes are generated.

[0331] Step 306: If the URL (looked for in step 305 above) does not exist, an evaluation step commences at the route of links leading from the catalog beginning to the desired product page: (L1, L2, . . . , Ln), such that L1 leads to the first categories page of the catalog, and Ln is the selected category link leading to a product page. If Ln points to a page that is no longer valid, go back to Li, such that Li is the first link from the end with a valid URL. Advance incrementally from Li to Ln, by fetching the page pointed to by Li, and looking for the Li+1 link in the page, until the new URL address for Ln is found. The route from L1 to Ln was prepared off-line in the LinkGraph generation, thus it is possible that in an on-line access one may find that this route does not appear as is in the catalog, because of changes in the categories structure or text. The following changes may be handled:

[0332] If a link name is not found in a page where it is expected to be found, an equivalent link name is sought, where equivalency is set by applying the rules as in the learn phase.

[0333] If Ln used to be a leaf category, pointing to a product page, and in on-line access it is found that it points to another categories page, then all the product pages under Ln are taken recursively.

[0334] If any Lj is not found (product type is not sold any more, links were added after a link that used to be a leaf, other types of pages were added) then this on-line catalog is not handled on-line, but rather an off-line update process is notified as to which information should be reprocessed.

[0335] Step 307: For each product page obtained, the product information in the page is extracted. Then, for each user-request phrase, all the product information is aggregated into an ordered table.

[0336] Step 308: Send the results (raw pages or tables) to the end-user system.
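The back-off and re-advance along the route (L1, ..., Ln) in step 306 can be sketched without any network access. The model is an assumption: the live catalog is a hypothetical dictionary from URL to the links found on that page, and only exact link names are looked up, so the equivalent-name search and the recursive leaf expansion are not shown.

```python
def repair_route(route, site):
    """route: list of (link_name, stored_url); site: url -> {link_name: url}.

    Returns the current URL for Ln, or None when the catalog must be handed
    over to the off-line update process.
    """
    # Go back to Li, the first link from the end whose stored URL is still valid.
    i = len(route) - 1
    while i >= 0 and route[i][1] not in site:
        i -= 1
    if i < 0:
        return None
    url = route[i][1]
    # Advance from Li to Ln, looking for each next link name in the fetched page.
    for link_name, _stored_url in route[i + 1:]:
        links = site.get(url)
        if links is None or link_name not in links:
            return None
        url = links[link_name]
    return url


# Hypothetical live catalog: the "Laser" page has moved to a new address.
site = {
    "/printers": {"Laser": "/printers/laser-v2"},
    "/printers/laser-v2": {},
}
route = [("Printers", "/printers"), ("Laser", "/printers/laser")]
new_url = repair_route(route, site)
```

The stale URL for “Laser” is replaced by the address found on the still-valid “Printers” page.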

Page Type Identification

[0337] Each catalog contains a number of page types, e.g.,

[0338] Categories page: a page used for browsing the catalog, containing a list of product categories/types/properties. The catalog may contain many categories pages, all in the same structure. The categories page may contain more information in addition to the categories list, e.g., links to other places in the site or outside of the catalog.

[0339] Products page: a page with a list of products. Each product has its own description (e.g., product name, part number, manufacturer, price . . . ). The products page may contain additional information beyond the list of products.

[0340] Other pages, e.g., search form of the catalog, are ignored at the learn phase.

[0341] The categories page and products page are constructed of the same building blocks and expressions. Thus, it is assumed that the following is given:

[0342] Categories page regular expression: a regular expression representing the categories page structure such that a page matches the regular expression if and only if it is a categories page.

[0343] Products page regular expression: a regular expression representing any products page such that a page matches the regular expression if and only if it is a products page.

[0344] Each page is matched to the categories page regular expression. If there is a match, the page type is “Categories page.” Else, the page is matched to the products page regular expression. If there is a match, the page type is “Products page.” Else its type is “Other.”
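The page type test is a two-stage regular expression match. In the sketch below the two patterns are toy assumptions for a made-up plain-text page layout; a real catalog would need expressions written for its own page structure.

```python
import re

# Hypothetical page layouts: one "CATEGORY:" or "PRODUCT:" entry per line.
CATEGORIES_RE = re.compile(r"(?:CATEGORY: .+\n)+")
PRODUCTS_RE = re.compile(r"(?:PRODUCT: .+ \| \$\d+(?:\.\d\d)?\n)+")


def page_type(page, categories_re=CATEGORIES_RE, products_re=PRODUCTS_RE):
    """Classify a page by matching it against the two given expressions in order."""
    if categories_re.fullmatch(page):
        return "Categories page"
    if products_re.fullmatch(page):
        return "Products page"
    return "Other"


kind = page_type("PRODUCT: LaserJet | $199.99\nPRODUCT: InkJet | $89\n")
```

`fullmatch` is used so that a page counts as a match only when the whole page fits the expression, in line with the “if and only if” wording above.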

Further Environments and Different Configurations

[0345] The preferred embodiment has been described in terms of an electronic commerce application over the Internet infrastructure. However, the main novelty (but not the only one) resides in constructing a unified category classification out of heterogeneous classification.

[0346] The underlying concept of the invention can be adapted to other global network environments. Thus, for example, in another embodiment, the basic inter-communication model of hyperlinks is maintained and the on-line catalog is constructed as a graph of classification links (among other links) and information pages. In this example, such a change affects only the LinkGraph formalities since the basic equivalence between the on-line catalog and the LinkGraph is kept.

[0347] A non-client-server configuration, e.g., only one computer: the same generalization applies to the client-server environment. The underlying concept of the invention could likewise be easily adapted to a single computer which performs the learn phase on the on-line catalogs, and stores the LinkGraph and classification graph locally. Then, the user accesses the user query phase via the very same computer.

[0348] Those versed in the art will readily appreciate that the various described learn-phase and query-phase operations are only some out of many possible variants to obtain the unified ClassificationGraph in the manner specified. Accordingly, rules and parameters that appear in the specified steps may be modified, added or deleted, all as required and appropriate depending upon the particular application. The same applies to the steps which pertain to the user query phase.

[0349] The English words mentioned in this document are not part of the algorithm. Rather, they are given as examples to aid the understanding of the method. Analogous words in a different language could be used. Thus, the method is not restricted to English.

[0350] The present invention has been described with a certain degree of particularity, but those versed in the art will readily appreciate that various alterations and modifications may be carried out without departing from the spirit of the invention as defined in the claims.

What is claimed is:
 1. A method for dynamically obtaining a unified classification information graph (UCIG), the UCIG providing a navigation system for a user to access sought information, the method comprising the steps of: (a) providing a plurality of information resources not all necessarily having the same conceptualization system, each resource including a respective hierarchy of categories, leaf categories in each hierarchy being connected to information pages; and (b) generating a UCIG by carrying out knowledge acquisition tasks using at least the hierarchy of categories and the categories of the information resources, the UCIG including a unified hierarchy of categories, leaf categories in the unified hierarchy of categories being connected to information pages of the plurality of information resources, such that information pages accessible through the hierarchy of the plurality of information resources are also accessible through the unified hierarchy of categories of the UCIG.
 2. A method as recited in claim 1, wherein at least some of the provided information resources are located in World Wide Web sites on the Internet.
 3. A method as recited in claim 1, wherein at least some of the provided information resources are located in databases.
 4. A method as recited in claim 1, wherein at least some of the provided information resources are located in on-line catalogs.
 5. A method as recited in claim 1, further comprising associating categories in the hierarchy of categories in the plurality of information resources with hyperlinks.
 6. A method as recited in claim 1, further comprising associating categories in the hierarchy of categories in the plurality of information resources with menus.
 7. A method as recited in claim 1, wherein the step of providing the plurality of information resources further includes: (i) initializing to generate for each information resource a link graph that corresponds to the information resource, each link graph including one or more link graph categories; (ii) normalizing the one or more link graph categories of each link graph to generate a classification graph for the information resource that includes classification graph categories; and (iii) unifying the classification graphs to generate the UCIG.
 8. A method according to claim 7, further comprising providing a URL pointer of an on-line catalog for generating the link graph.
 9. A machine having a memory containing data representing a unified classification information graph that was generated by the method of claim 1.
 10. A memory storing data for access by an application program, the application program accessible to a user through a user interface for the user to access sought information, the application program being executed on a data processing system, the data comprising: a data structure including a unified classification information graph generated by carrying out knowledge acquisition tasks from a plurality of information resources not necessarily all having the same conceptualization system, the plurality of information resources providing access to information pages, the unified classification graph including a unified hierarchy of categories, leaf categories in the unified hierarchy being connected to information pages of the information resources, such that information pages accessible through the information resources are also accessible through the hierarchy of the unified classification information graph.
 11. A system for dynamically obtaining a unified classification information graph that provides a navigation system for a user to access sought information, the system comprising: an input device receiving a plurality of information resources not all necessarily having the same conceptualization system, the information resources each including a respective hierarchy of categories, leaf categories in the hierarchy being connected to information pages; and a generator to generate a unified classification information graph by carrying out knowledge acquisition tasks utilizing at least the hierarchy of categories and the categories of the information resources, the unified classification information graph including a unified hierarchy of categories, leaf categories in the unified hierarchy of categories being connected to information pages of the plurality of information resources, such that information pages accessible through the hierarchy of the information resources are also accessible through the unified hierarchy of the unified classification information graph.
 12. A system as recited in claim 11, wherein at least some of the information resources are located in sites of the Internet.
 13. A system as recited in claim 11, wherein at least some of the information resources are located in databases.
 14. A system as recited in claim 11, wherein at least some of the information resources are located in on-line catalogs.
 15. A system as recited in claim 11, wherein categories in the hierarchy of categories in the information resources are associated with hyperlinks.
 16. A system as recited in claim 11, wherein categories in the hierarchy of categories in the information resources are associated with menus.
 17. A system as recited in claim 11, wherein the generator includes: an initialization unit for generating a respective link graph corresponding to each information resource, each link graph including link graph categories; a normalization unit for normalizing the link graph categories to generate for each link graph a corresponding classification graph that includes classification graph categories; and a unifying unit for unifying the classification graphs to generate the unified classification information graph.
 18. A system according to claim 17, wherein generating one of the link graphs includes providing a URL pointer of an on-line catalog.
 19. A method for retrieving information of interest using a unified classification information graph (UCIG), the UCIG providing a navigation system for a user to access sought information, the UCIG generated by a process including: providing a plurality of information resources not necessarily having the same conceptualization system, each resource including a respective hierarchy of categories, leaf categories in each hierarchy being connected to information pages; and generating a UCIG by carrying out knowledge acquisition tasks using at least the hierarchy of categories and the categories of the information resources, the UCIG including a unified hierarchy of categories, leaf categories in the unified hierarchy of categories being connected to information pages of the plurality of information resources, such that information pages accessible through the hierarchy of the plurality of information resources are also accessible through the unified hierarchy of categories of the UCIG, the method comprising: providing a user query; and identifying unified categories in the unified classification information graph which substantially match the query.
 20. A method as recited in claim 19, further comprising the step of identifying the at least one information page in the unified classification information graph that is connected to the categories of the unified graph.
 21. A system for retrieving information of interest using a unified classification information graph (UCIG), the UCIG generated by a process including: providing a plurality of information resources not necessarily having the same conceptualization system, each resource including a respective hierarchy of categories, leaf categories in each hierarchy being connected to information pages; and generating a UCIG by carrying out knowledge acquisition tasks using at least the hierarchy of categories and the categories of the information resources, the UCIG including a unified hierarchy of categories, leaf categories in the unified hierarchy of categories being connected to information pages of the plurality of information resources, such that information pages accessible through the hierarchy of the plurality of information resources are also accessible through the unified hierarchy of categories of the UCIG, the system comprising: an interface for receiving a user query; and an identifier identifying unified categories in the unified classification information graph which substantially match the query.
 22. A system as recited in claim 21, wherein in the case of the identifier identifying a particular category that substantially matches the query and that is a leaf category attached to one or more particular information pages, the identifier also identifies the one or more particular information pages.
 23. A method for dynamically updating a provided unified hierarchy of categories representable by a unified classification information graph (UCIG), leaf categories in the unified hierarchy of categories connected to information pages of a first plurality of information pages, the UCIG providing a navigation system for a user to access sought information, the method comprising the steps of: providing one or more additional information resources that each includes a hierarchy of categories, all the additional information resources not necessarily having the same conceptualization system, leaf categories in the hierarchy of any of the additional information resources being connected to information pages of a second plurality of information pages; providing the unified classification graph of the unified hierarchy of categories; and generating an updated unified classification information graph by carrying out knowledge acquisition tasks utilizing at least the hierarchy of categories and the categories of the additional information resources, the updated unified classification graph including the provided unified hierarchy of categories, leaf categories in the updated unified classification graph being connected to information pages of the first and second plurality of information pages; such that information pages accessible through the hierarchy of the additional information resources are also accessible through the updated unified classification information graph.
 24. A method as recited in claim 23, wherein at least some of the provided additional information resources are located in sites of the Internet.
 25. A method as recited in claim 23, wherein at least some of the provided additional information resources are located in databases.
 26. A method as recited in claim 23, wherein at least some of the provided additional information resources are located in on-line catalogs.
 27. A method as recited in claim 23, further comprising associating categories in the hierarchy of categories in the provided additional information resources with hyperlinks.
 28. A method as recited in claim 23, further comprising associating categories in the hierarchy of categories in the provided additional information resources with menus.
 29. A method as recited in claim 23, wherein generating the updated unified classification information graph includes: (i) initializing to generate for each provided additional information resource a link graph that corresponds to the information resource, each link graph including link graph categories; (ii) normalizing the link graph categories of each link graph to generate one or more classification graphs that correspond to the one or more link graphs and that each includes classification graph categories; and (iii) unifying the one or more classification graphs with the provided unified classification information graph to generate the updated unified classification information graph.
 30. A method according to claim 29, further comprising providing a URL pointer of an on-line catalog for generating the link graph.
 31. A carrier medium carrying one or more computer readable code segments to cause the one or more processors of a computer system to dynamically generate a unified classification information graph (UCIG), the UCIG providing a navigation system for a user to access sought information, the carrier medium comprising: for a provided plurality of information resources, the information resources not necessarily having the same conceptualization system, each resource including a respective hierarchy of categories, leaf categories in each hierarchy being connected to information pages, code to cause the one or more processors to generate a UCIG by carrying out knowledge acquisition tasks using at least the hierarchy of categories and the categories of the information resources, the UCIG including a unified hierarchy of categories, leaf categories in the unified hierarchy of categories being connected to information pages of the plurality of information resources, such that information pages accessible through the hierarchy of the plurality of information resources are also accessible through the unified hierarchy of categories of the UCIG.
 32. A carrier medium as recited in claim 31, wherein at least some of the provided information resources are located in World Wide Web sites on the Internet.
 33. A carrier medium as recited in claim 31, wherein at least some of the provided information resources are located in databases.
 34. A carrier medium as recited in claim 31, wherein at least some of the provided information resources are located in on-line catalogs.
 35. A carrier medium as recited in claim 31, wherein categories in the hierarchy of categories in the plurality of information resources are associated with hyperlinks.
 36. A carrier medium as recited in claim 31, wherein categories in the hierarchy of categories in the plurality of information resources are associated with menus.
 37. A carrier medium as recited in claim 31, further comprising: wherein the step of providing the plurality of information resources further includes: code to cause the one or more processors to generate for each information resource a link graph that corresponds to the information resource, each link graph including one or more link graph categories; code to cause the one or more processors to normalize the one or more link graph categories of each link graph to generate for each link graph a classification graph that includes classification graph categories; and code to cause the one or more processors to unify the classification graphs to generate the UCIG.
 38. A carrier medium according to claim 37, wherein the generating of the link graph uses a provided URL pointer of an on-line catalog.
 39. A method for dynamically updating a provided unified hierarchy of categories representable by a unified classification information graph (UCIG), leaf categories in the unified hierarchy of categories connected to information pages of a first plurality of information pages, the UCIG providing a navigation system for a user to access sought information, the method comprising the steps of: providing one or more additional information resources that each includes a hierarchy of categories, all the additional information resources not necessarily having the same conceptualization system, leaf categories in the hierarchy of any of the additional information resources being connected to information pages of a second plurality of information pages; providing the unified classification graph of the unified hierarchy of categories; and generating an updated unified classification information graph by carrying out knowledge acquisition tasks utilizing at least the hierarchy of categories and the categories of the additional information resources, the updated unified classification graph including an updated unified hierarchy of categories, leaf categories in the updated unified hierarchy of categories being connected to information pages of the first and second plurality of information pages; such that information pages accessible through the hierarchy of the additional information resources are also accessible through the updated unified hierarchy of the updated unified classification information graph.
 40. A method as recited in claim 39, wherein generating the updated unified classification information graph includes: (i) initializing to generate for each provided additional information resource a link graph that corresponds to the information resource, each link graph including link graph categories; (ii) normalizing the link graph categories of each link graph to generate one or more classification graphs that correspond to the one or more link graphs and that each includes classification graph categories; and (iii) unifying the one or more classification graphs with the provided unified classification information graph to generate the updated unified classification information graph.
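The three steps (i) to (iii) recited for updating a provided unified graph can be sketched roughly as follows. This is an illustration under strong simplifying assumptions, not the claimed knowledge acquisition process: a resource is modeled as a nested dictionary, normalization is reduced to lower-casing plus a hypothetical synonym table (`SYNONYMS`), and unification is a merge on identical normalized category paths. All function names and page identifiers are invented for the example.

```python
# Hypothetical synonym table standing in for real nomenclature knowledge.
SYNONYMS = {"rs232 cable": "printer cable"}


def build_link_graph(resource, path=()):
    """Step (i): flatten a nested category hierarchy into
    {category path: [pages]} entries, one per leaf category."""
    graph = {}
    for label, child in resource.items():
        here = path + (label,)
        if isinstance(child, dict):
            graph.update(build_link_graph(child, here))
        else:
            graph[here] = list(child)
    return graph


def normalize(link_graph):
    """Step (ii): map raw link graph labels onto a shared vocabulary,
    yielding a classification graph."""
    classification_graph = {}
    for path, pages in link_graph.items():
        cleaned = (label.strip().lower() for label in path)
        norm_path = tuple(SYNONYMS.get(label, label) for label in cleaned)
        classification_graph.setdefault(norm_path, []).extend(pages)
    return classification_graph


def unify(ucig, classification_graphs):
    """Step (iii): merge classification graphs into a copy of the provided
    unified graph; existing categories and their pages are retained."""
    updated = {path: list(pages) for path, pages in ucig.items()}
    for graph in classification_graphs:
        for path, pages in graph.items():
            bucket = updated.setdefault(path, [])
            for page in pages:
                if page not in bucket:
                    bucket.append(page)
    return updated


# Provided unified graph (first plurality of information pages).
ucig = {("printers", "accessories", "printer cable"): ["catalogA/printer-cable.html"]}

# Additional resource that uses a different nomenclature (second plurality).
new_resource = {"Printers": {"Accessories": {"RS232 cable": ["catalogB/rs232.html"]}}}

updated_ucig = unify(ucig, [normalize(build_link_graph(new_resource))])
```

In this sketch the leaf of the updated graph links pages from both the original graph and the added resource, while the provided graph `ucig` itself is left unmodified.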